Textline detection in degraded historical document images

Authors

  • Byeongyong Ahn
  • Jewoong Ryu
  • Hyung Il Koo
  • Nam Ik Cho
Abstract

This paper presents a textline detection method for degraded historical documents. Our method follows a conventional two-step procedure: binarization is performed first, and the textlines are then extracted from the binary image. To address the challenges of historical documents, such as document degradation, structure noise, and skew, we develop new methods for both binarization and textline extraction. First, we improve binarization performance by detecting the non-text regions and processing only the text regions. We also improve textline detection by extracting the main textblock and compensating for the skew angle and writing style. Experimental results show that the proposed method achieves state-of-the-art performance on several datasets.
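The two-step structure described in the abstract (binarize, then extract textlines from the binary image) can be illustrated with a minimal sketch. This is not the authors' method: it substitutes plain Otsu thresholding for their region-aware binarization and a horizontal projection profile for their textline extraction, and it ignores skew, non-text regions, and writing style. The function names `otsu_threshold`, `binarize`, and `detect_textlines`, and the `min_ink` parameter, are hypothetical names chosen for this illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, values 0..255


def otsu_threshold(gray: Image) -> int:
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class (values <= t)
    sum0 = 0  # intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: Image) -> Image:
    """Step 1 (assumed stand-in): mark dark pixels (ink) as 1, background as 0."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def detect_textlines(binary: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Step 2 (assumed stand-in): return (first_row, last_row) for each
    run of consecutive rows whose ink count is at least min_ink."""
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(binary) - 1))
    return lines
```

A projection profile of this kind only separates lines that are horizontal and well spaced, which is one reason a method aimed at degraded historical pages needs the skew compensation and textblock extraction that the abstract mentions.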

Similar resources

Ridges Based Curled Textline Region Detection from Grayscale Camera-Captured Document Images

As compared to scanners, cameras offer fast, flexible and non-contact document imaging, but with distortions like uneven shading and warped shape. Therefore, camera-captured document images need preprocessing steps like binarization and textline detection for dewarping so that traditional document image processing steps can be applied on them. Previous approaches of binarization and curled text...

Interactive degraded document enhancement and ground truth generation

Degraded documents are frequently obtained in various situations. Examples of degraded document collections include historical document depositories, documents obtained in legal and security investigations, and legal and medical archives. Degraded document images are hard to read and hard to analyze using computerized techniques. There is hence a need for systems that are capable of enhan...

Restoration of Degraded Historical Document Image: An Adaptive Multilayer-Information Binarization Technique

The binary image is the essential format for document image processing, and the operation of the subsequent steps depends on the quality of the binarization process. The objective of this research is to propose a new binarization method based on adaptive multilayer information for the restoration of degraded historical document images. This paper focuses on degraded Thai historical document images, whi...

Restoration of Degraded Historical Document Image

Restoration plays a very important role in enhancing degraded, noisy images. Numerous algorithms have been designed to enhance degraded images. Since image processing algorithms are subjective, not every algorithm developed will address all types of degradation. To address a specific type of problem, suitable algorithms need to be selected. In this paper a combination of spatial...

Binarization of Document Image

Document image binarization is performed in the preprocessing stage of document analysis and aims to segment the foreground text from the document background. A fast and accurate document image binarization technique is important for the ensuing document image processing tasks such as optical character recognition (OCR). Though document image binarization has been studied for many years, t...

Journal title:
  • EURASIP J. Image and Video Processing

Volume 2017  Issue 

Pages  -

Publication date 2017